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Abstract 

Monte Carlo (MC) methods are widely used for Bayesian inference and optimization in statistics, signal processing and machine 
learning. A well-known class of MC methods are Markov Chain Monte Carlo (MCMC) algorithms. In order to foster better 
exploration of the state space, specially in high-dimensional applications, several schemes employing multiple parallel MCMC 
chains have been recently introduced. In this work, we describe a novel parallel interacting MCMC scheme, called orthogonal 
MCMC (O-MCMC), where a set of “vertical’' parallel MCMC chains share information using some ’’horizontal” MCMC techniques 
working on the entire population of current states. More specifically, the vertical chains are led by random-walk proposals, whereas 
the horizontal MCMC techniques employ independent proposals, thus allowing an efficient combination of global exploration and 
local approximation. The interaction is contained in these horizontal iterations. Within the analysis of different implementations 
of O-MCMC, novel schemes in order to reduce the overall computational cost of parallel multiple try Metropolis (MTM) chains 
are also presented. Furthermore, a modified version of O-MCMC for optimization is provided by considering parallel simulated 
annealing (SA) algorithms. Numerical results show the advantages of the proposed sampling scheme in terms of efficiency in the 
estimation, as well as robustness in terms of independence with respect to initial values and the choice of the parameters. 

©2011 Published by Elsevier Ltd. 

Keywords: Bayesian Inference; Optimization; Parallel Markov Chain Monte Carlo; Parallel Multiple Try Metropolis; Block 
Independent Metropolis; Parallel Simulated Annealing; Recycling samples. 


1. Introduction 

Monte Carlo (MC) methods are widely employed in different fields for Bayesian inference and stochastic opti¬ 
mization [1, 2, 3, 4]. Markov Chain Monte Carlo (MCMC) methods [5, 6, 4] are well-known MC methodologies to 
draw random samples and efficiently compute integrals involving a complicated multidimensional target probability 
density function (pdf), n(x) with x e 1) c W 1, . MCMC techniques only need to be able to evaluate the target pdf, 
but the difficulty of diagnosing and speeding up the convergence has driven intensive research efforts in this field. 
For instance, several adaptive MCMC methods have been developed in order to determine adequately the shape and 
spread of the proposal density used to generate candidate samples within an MCMC scheme [7, 8, 4, 9]. Nevertheless, 
guaranteeing the theoretical convergence is still an issue in most of the cases. Moreover, in a single specific (long) run, 
the generated chain can remain trapped in a local mode and, in this scenario, the adaptation could even slow down the 
convergence. Thus, in order to speed up the exploration of the state space, and specially to deal with high-dimensional 
applications, several schemes employing parallel chains have been recently proposed [2, 9], as well as multiple try and 
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interacting schemes [10]. However, the problem is still far from being solved. The interest in the parallel computation 
can be also originated by other motivations. For instance, several authors have studied the parallelization of MCMC 
algorithms, which have traditionally been implemented in an iterative non-parallel fashion, in order to reduce their 
computation time [11, 12], 

In this work, we focus on the implementation of parallel MCMC chains in order to foster the exploration of the 
state space and improve the overall performance. Computational speed up (as result of the parallelization) can be 
seen as an additional benefit of the proposed approach, but it is not the main goal of the paper. We introduce a 
novel scheme that considers a population of samples at each iteration, similarly to other population-based techniques 
[13, 14, 15, 3, 16, 17]. 1 More specifically, we present a novel family of parallel MCMC schemes, called orthogonal 
MCMC (O-MCMC) algorithms, where N different chains are independently run and, at some pre-specified iterations, 
they exchange information using another MCMC technique applied on the entire cloud of current states. Assuming 
that all the MCMC techniques used yield chains converging to the target pdf, the ergodicity of the global scheme is 
guaranteed: the whole kernel is still valid, since it is obtained as the multiplication of ergodic kernels with the same 
invariant pdf. Fixing the computational cost, the computing effort can be divided into N parallel processes but, at 
some iteration, information among the chains is exchanged in order to enhance the overall mixing. Let us remark also 
that the novel O-MCMC scheme is able to combine efficiently both the random-walk and the independent proposal 
approaches, as both strategies have advantages and drawbacks. On the one hand, random-walk proposal pdfs are 
often used when there is no specific information about the target, since this approach turns out to be more explorative 
than using a fixed proposal. On the other hand, a well-chosen independent proposal density usually provides less 
correlation among the samples in the generated chain. In the novel method, the parallel “vertical” chains (based on 
random-walk proposals) move around as “free explorers” roaming the state space, whereas the “horizontal” MCMC 
technique (applied over the population of current states and based on independent proposals) works as a “park ranger”, 
redirecting “lost explorers” towards the “beaten track” according to the target pdf. Unlike in [19, 20, 21, 22], the 
exchange of information occurs taking always into account the whole population of current states, instead of applying 
crossover or exchange schemes between specific pairs of chains. Tempering of the target pdf is not considered for 
sampling purposes but it is employed for optimization. Hence, our approach resembles the nonreversible parallel MH 
algorithms described in [23, 24], where the whole population of states is also updated jointly at the times of interaction, 
pursuing non-reversibility instead of tempering as a means to accelerate convergence towards posterior mode regions. 
However, both tempering and crossovers can also be easily implemented within the O-MCMC framework. 

Another important contribution of the work is the computational improvement provided by novel parallel im¬ 
plementations of MCMC techniques using multiple candidates at each iteration. We present two novel schemes for 
parallel Multiple try Metropolis (MTM) chains [10, 25, 26, 27, 28, 29] (and similarly to [12]) in order to reduce the 
overall computational cost in the same fashion of [11], saving generated samples, target evaluations and multinomial 
sampling steps. One of them is an extended version, using several candidates, of the Block Independent Metropolis 
presented in [11]. The ergodicity of both schemes is guaranteed. These novel parallel MTM techniques are employed 
as horizontal methods in O-MCMC. The corresponding O-MCMC scheme (using a novel parallel MTM method) can 
also be interpreted as an MTM algorithm employing an adaptive proposal density. This pdf is a mixture of N com¬ 
ponents: the adaptation of the location parameters of the N components is driven by the vertical parallel chains (note 
that the outputs of these chains are also used in the estimation). Furthermore, we describe a modified version of O- 
MCMC for solving optimization problems (where we employ tempering of the target), considering parallel Simulated 
Annealing algorithms [30, 31, 32] for the vertical movements. Numerical simulations show that O-MCMC exhibits 
both flexibility and robustness with respect to the initialization and parameterization of the proposals. 

It is also important to remark that, in literature, there is a great interested in proposing possible parallel implemen¬ 
tation of MCMC algorithms [33, 34, 35, 36, 37], distributing the computing in different parallel processors. However, 
it is not the goal of this work: we focus on suggesting a novel MCMC scheme which improves the performance w.r.t. 
other techniques, fixing the number of target density evaluations (similarly to [11, 12]). 

The paper is structured as follows. Section 2 summarizes the general framework and the aim of the work. Section 


'A preliminary version of this work has been published in [18]. With respect to that paper, here we propose several novel interacting schemes 
for exchanging information among the chains, analyze the theoretical basis of the proposed approach and discuss its relationships w.r.t. other 
techniques, in detail. Different variants are presented in order to reduce the overall computational cost and for applying O-MCMC in optimization 
problems. Furthermore, we provide more exhaustive numerical simulations. 
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3 describes the generic O-MCMC scheme, whereas Sections 4 and 5 provide different specific examples of vertical 
and horizontal movements, respectively. Section 6 discusses the O-MCMC framework for optimization and Section 
7 describes the connections with other techniques. Section 8 provides different numerical results. Finally, some 
concluding remarks are provided in Section 9. 


2. Bayesian inference problem 


In many applications, we aim at inferring a variable of interest given a set of observations or measurements. Let 
us denote the variable of interest by x e T> c M. d \ and let y e K d>: be the observed data. The posterior pdf is then 


tt(x) = p(x\y) = 


i(y|x)g(x) 

z( y) 


( 1 ) 


where £(y|x) is the likelihood function, g(x) is the prior pdf and Z(y) is the model evidence (a.k.a. marginal likelihood). 
In general, Z(y) is unknown, so we consider the corresponding unnormalized target function. 


tt(x) = £(y|x)g(x). 


( 2 ) 


In general, the analytical study of the posterior density 7f(x) oc 7r(x) is unfeasible (for instance, integrals involving 
7r(x) are typically intractable), and numerical approximations are required. Our goal is to approximate efficiently tt(\) 
employing a cloud of random samples. In general, a direct method for drawing independent samples from 7r(x) is 
not available and alternative approaches (e.g., MCMC algorithms) are needed. The only required assumption is being 
able to evaluate the unnormalized target function n(x). 


3. O-MCMC algorithms: General outline 

Let us consider N parallel vertical chains, (x„ ,)' v =| with t e N, generated by different MCMC techniques with 
random-walk proposal pdfs q n j(x) - q n (x |x„, f -i) = q„(x - x„ ,_i), i.e., x„,,_i plays the role of a location parameter for 
the proposal pdf used in the next iteration. Let us denote the population of current states at the f-th iteration as 

Pt = {Xi j,X 2 j, ... ,x N ',}. 

At certain selected iterations, we apply another MCMC technique taking into account the entire population of states 
V, |, yielding a new cloud of samples P t . In this “horizontal” transitions, the different chains share information. The 
horizontal MCMC technique uses a proposal pdf which is independent from the previous states, unlike the random 
walk proposals employed in the vertical MCMC chains. The general O-MCMC approach is represented graphically 
in Figure 1 and summarized below: 

1. Initialization: Choose the N initial states, 

n = {Xl,0,X2,0, • • • ,Xat.o}, 

the total number of iterations, T , and three positive integer values M,Ty,Tn e N\{0} such that M{Ty + Th) — T. 
Set t = 1. 

2. For : 

(a) Vertical period: For 

t — (m — 1)(7V + Th) + 1,..., mTy + (m - 1)7//, 

run N independent MCMC techniques, starting from x„ ,_ i e P, \, to obtain x nJ for n = 1,..., N, i.e., a new 
population of states P t = {xi j,X2 >f , ... ,x/v,/}. 

(b) Horizontal period: For 

t = mTy + (m - 1)7// + 1,..., m(Ty + Th ), 

apply an MCMC approach taking into account the entire population P t -\ to generate the next cloud P t . 

3. Output: Return the NT = NM(T V + T H ) samples contained in all the sets P t , for t = 1,..., 7. 

3 
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Figure 1. A graphical representation of the O-MCMC approach. After Ty vertical transitions, then Th horizontal steps are performed. 


Table 1. Notation for O-MCMC. 



In summary, one vertical period contains Tv iterations of the chains, whereas in one horizontal period we have Th 
iterations. Hence, given t = (m - l)(T v + T H ), after one cycle of vertical and horizontal steps we have t = m(T v + T H ). 
The total number of cycles (or epochs) 2 is then M — T J[ T „ ■ The ergodicity is guaranteed if the vertical and horizontal 
steps produce ergodic chains with invariant density 7r(x) (see Appendix A for further details). Table 1 summarizes 
the main notation of the paper and the connections of O-MCMC with other techniques are discussed in Section 7. In 
the following two sections, we introduce several examples of vertical and horizontal movements that lead to different 
O-MCMC algorithms. 

3.1. Key observation: burn-in and convergence 

In general, several authors have noted that there is not a clear advantage using independent parallel MCMC chains 
(IPCs) with respect to employing a single longer MCMC chain (fixing the number of evaluation of the target Ej) 
in terms of performance (e.g., see [4, 9, 20, 21, 38]). The reason is that all the shorter parallel chains can remain 
within their “burn-in” period, thus jeopardizing the global performance, whereas the single longer chain can reach the 
convergence. Thus, the preference between these two schemes depends on the specific problem [4, 38, 39, 40], 

The motivation behind O-MCMC is to take advantage of the aforementioned drawback of the IPCs scheme. Using 
IPCs we can discover different features of the target pdf in faster way with respect to the use of a single chain, since the 
different chains will typically concentrate on different areas of the target during the first iterations depending on their 
initialization. O-MCMC allows the exchange of information among the chains without jeopardizing their ergodicity 
(see Appendix A). This is particularly useful in multimodal, high-dimensional problems. For instance, using different 
chains, there are more chances to discover the two modes of the target n in Figure 2. The horizontal step of O-MCMC 

2 One cycle, or epoch, includes one the vertical period and one horizontal period. 
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Figure 2. A graphical representation of the key motivation behind the O-MCMC approach. A bivariate, bimodal target pdf 7r(x) = n(x i, X 2 ) is 
considered (shown its contour plot) and N = 2 independent chains are run (their trajectories are depicted with dashed lines), both becoming trapped 
in a different mode. The horizontal step in O-MCMC fosters the mixing of the two chains by the exchange of information. 


allows the communications between the two chains in Figure 2, fostering the identification of the other mode. Indeed, 
even if some chain is trapped around one mode, O-MCMC can still take advantage of this scenario by redirecting the 
other chains away from it, and the horizontal stage (which can be interpreted as an alternative to the use of resampling 
procedures [13, 14, 15]) will eventually cause this chain to move away from that mode. Finally, observe that by 
employing parallel chains it is possible to apply a diagnosis criterion in order to estimate the “burn-in” period, as 
already done by other authors [41, 42, 43, 44, 45]. This information can be employed in order to design adaptive 
strategies, as suggested in [9]. 

4. Vertical Movements 

In this section, we describe different implementations of the vertical parallel chains. Although it is not strictly 
necessary, we consider only random walk proposal densities in the vertical chains. The idea is to exploit predominantly 
the explorative behavior of the independent parallel MCMC methods. Therefore, we only consider proposals of the 
type = q n (x - x„, f _i). In this case, a sample x' ~ q n (x\x nJ -\) can be expressed as 

X — X„_,_ | + (3) 

where £ nt ~ q(^). Another more sophisticated possibility is to include the gradient information of the target within 
the proposal pdf, as suggest in the Metropolis-Adjusted Langevin Algorithm (MALA) [46], In this case, a sample 
x' ~ ^ H (x|x tt f_i) becomes 

x' = x„,,_i + ~ V log [7r(x ;if _!)] + Vi£ v , (4) 

where £„ if ~ q(£j) and V/(x) denotes the gradient of a generic function /(x). This second alternative can be particularly 
useful in high-dimensional spaces, although it inevitably increases the probability of the chain of becoming trapped 
in one mode of the target in a multi-modal scenario. Thus, the joint application of N parallel chains appears very 
appropriate in this scenario, since they can easier reach different modes of the target. Moreover, the application of the 
O-MCMC scheme facilitates the jumps among the different modes. 

Regarding the MCMC algorithm, note that the random walk proposal density q n (x |x„ if _i) can be applied within 
different MCMC kernels. The simplest possibility is using a Metropolis-Hastings (MH) algorithm [4]. For each 
n = 1,..., N and for a given time step t, one MH update of the n- th chain is obtained as 

1. Drawx' ~ ^„(x|x (! ,_!). 
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2. Set x n j = x' with probability 


a n = min 


7r(x')g„(x„, r _i|x') 

tt(X n j- I ) ( 7n(x X /L , | ) 


Otherwise (i.e., with probability 1 - a„) set x, u = x, i f _|. 


Many other alternative schemes can be used instead of MH kernel for the vertical chains. For instance, two particularly 
appealing alternatives are the Multiple Try Metropolis (MTM) [25, 28] and the Delayed Rejection Metropolis [47] 
techniques. 


5. Horizontal Movements 

As described above, after each iteration t of the vertical period, the vertical chains return a population P, = 
[xi,,..., Xa/ ( [. When t = mTy + (m - 1 )T H , with m e [1, ...,M], i.e., after Ty vertical transitions, then Th horizontal 
steps are performed. The purpose of these horizontal MCMC transitions is to exchange information among the N 
different chains, improving the global mixing. In the following, we consider two different general approaches for 
sharing the information among the chains: 

• In the first one, a population-based MCMC algorithm is applied. The states of the vertical chains contained in 
P, are used as the initial population. Furthermore, the population-based MCMC scheme takes into account all 
the current population for making decisions about the next population. 

• In the second one, named as mixture-based approach, the initial population P, is also used for building a 
suitable density i/r(x). This pdf <// is employed as proposal by the N parallel MCMC chains for yielding the next 
populations P t+ \ ,..., P,+t h ■ More specifically, in this work we suggest to construct <//(x) as a mixture of N pdfs, 
each one centered in x n j e P t . 

In the following we show one specific example of the population-based approach and three different versions of the 
mixture-based scheme. In all the different cases, for the horizontal movements we consider the use of independent 
proposal pdfs, unlike for the vertical ones, where we have used of random walk proposals. 

5.1. Population-based approach 

We consider a generalized target density. 


N 

7t g (xi ,..., x N ) OC Y\ x(x„), (5) 

n= 1 

where each marginal, n(x„) for n = 1,..., N and x„ e T) c W 1 ', coincides with the target pdf in Eq. (2). The idea 
is that the horizontal MCMC transitions leave invariant the extended target n g . Namely, after a “burn-in” period, the 
population P t = [xi,,..., x^,} is distributed according to n g . The simplest possible population based scheme consists 
of employing a standard Metropolis-Hastings (MH) algorithm directly in the extended domain, D N c M. dxXN , with 
a target n g , generating (block) transitions from P, to P t+ \. However, the probability of accepting a new population 
in this case becomes negligible as N grows. As an alternative example of a population-based scheme, we consider 
the Sample Metropolis-Hastings (SMH) method [39, Chapter 4]. At each iteration, the underlying idea of SMH is 
replacing one “bad” sample in the population with a “better” one, according to a certain suitable probability. The new 
sample, candidate of be incorporated in the population, is generated from and independent proposal pdf iAx). The 
algorithm is designed so that, after a “burn-in" period t h , the elements in P,> (f > t h ) are distributed according to tt„ in 
Eq. (5). Table 2 provides a detailed description of the SMH-based horizontal transitions. 

The acceptance probability, 0 < a < 1, depends on the entire population, x„j-i for n = 1,..., N. and the new 
candidate sample, x ( ) , |. At each step, the sample chosen to be replaced is selected according to a probability pro¬ 
portional to the inverse of the corresponding importance weight. The ergodicity can be proved by considering the 
extended density n g as the target pdf (see Appendix D). Let us remark that the difference between P t and P,+\ is at 
most one sample. For this reason, a suggestion for a robust implementation is to set T h > N (so that all the samples are 
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Table 2. Sample Metropolis-Hastings (SMH) algorithm for horizontal transitions in O-MCMC. 


1. For t = mTy + (m - 1)7// + 1,... ,m(Ty + 7//): 

(a) Draw x 0 , f -t ~ <p(x). 

(b) Choose a "bad" sample X/j_i s P t ~i, i.e., select an index k e {1, TV} with probability proportional to the inverse of the 
importance sampling weights 

v(xu-i) 

k = 1 


n = 




y(x„,,-i) 


(c) Accept the new population P, = {x n j} N =1 , with x, i( = x, ,_i for all i ± k and x/_, = xo if _i, with probability 

y N y(X,,j-1) 

^77=1 /r(X ;? ,_l) 


0 <!“ 


( 6 ) 


Otherwise (i.e., with prob. 1 - a) set P t = P t - 1 - 


potentially replaced), although it is not strictly required as shown in Section 8 . Moreover, it can be convenient to use 
in the estimation only the last population P t +T H (excluding the sets among V, and V, , 7 „, generated in the horizontal 
step). 

Finally, note also that the SMH algorithm becomes the standard MH method for N - Hence, for /V = 1 the 
specific O-MCMC implementation using SMH consists of applying alternatively two MH kernels with different types 
of proposals: a random walk proposal, g„(x|x„ ,_i), and an independent one, i/>(x). This a well-known scheme (cf. 
[4, 39]), which can be seen as a particular case of the O-MCMC family of algorithms. 

5.2. Mixture-based approach 

An alternative approach is defining the following mixture of pdfs, which is updated every T v vertical transitions, 

1 N 

= lfl,„(x\<P t ) = - Yj (p„(x |x„, t ), (7) 

n= 1 

where t = mTy + (m - I)7)/, m = 1 and each x„ jf e V, plays the role of the location parameter of the «-th 

component of the mixture, <p„. It is important to remark that each component tp n is a density arbitrarily chosen by 
the user, defined in D (it can be even a mixture itself). Observe that i//(x) changes from one horizontal period to 
the next one (since it depends on the final population of the vertical period), but then it remains fixed within the 
T h iterations of each horizontal period. Thus, during the complete O-MCMC run we employ M different mixtures, 
if/i,..., iJ/M, one for each horizontal period. However, in order to simplify the notation, we use t//(x). Figure 3 provides 
a graphical representation. We employ if/(x) within N independent MCMC schemes as an independent proposal 
density, namely independent from the previous state of the chain. The underlying idea is using the information in V,. 
with t = mTy + (in - 1)7//, to build a good proposal function for performing N independent MCMC processes. The 
theoretical motivation is that, after the burn-in periods, the vertical chains have converged to the target, so x n t ~ n(x) 
for n = 1 ,... ,N. Then, i//(\) in Eq. (7) can be interpreted as a kernel density estimation of n, where tp„ play the role 
of the kernel functions. 

5.2.1. Basic schemes 

As a first example of this strategy, we consider the application of MH transitions. At each iteration t = mTy + 
(in - 1 )T h + 1,..., m(Ty + T H ). one sample x' is generated from t//(x) and then N different MH tests are performed. 
The procedure is shown in Table 3 and represented in Figure 4. Alternatively, a different sample x', drawn from <//(x), 
can be tested for each chain, as shown in Table 4. Hence, N different samples are drawn at each iteration (instead of 
only one) but, after building i//(x\'P t ). the process could be completely parallelized. The variant in Table 4 provides in 
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Figure 3. A graphical representation of the mixture-based strategy. The mixture ijj(x) is formed by N components, ^„(x|x„ r ), where x n j e P, plays 
the role of a location parameter. Note that each component ip n can be any kind of density defined in T), even a mixture itself. 



■ i 

i _ 

t -i t 


>• 


Figure 4. A schematic representation of the basic horizontal scheme described in Table 3. One specific transition of one specific chain is represented 
with the probability a„ = oj(x') A oAx n j i). where oj(x) = showing the two possible future states at the t-th iteration, of the n-th chain. 


general better performance, although at the expense of a increasing computational cost in terms of evaluations of the 
target and number of generated samples. However, the block independent MH methodology [11], proposed in order to 
reduce the computational effort by recycling generated samples and target evaluations, can be employed. For clarifying 
that, let us consider for simplicity T h = N. Step 2(a) in Table 3 could be modified by drawing only N independent 
samples xj,... ,x' N from t//(x) and, at each iteration t, a different circular permutation of the set {xj,..., x' v ) could be 
tested in the different N acceptance MH tests 3 . Note that, the scheme in Table 3 yields dependent chains, whereas the 
algorithm in Table 4 produces independent chains (the interaction, in this case, is only contained in the construction 
of the mixture i/z at the beginning of the horizontal period). Finally, observe that the procedure in Table 3 presents 
certain similarities with the Normal Kernel Coupler (NKC) method introduced in [48], thus indicating that NKC-type 
algorithms can be also employed as alternative population-based approaches. 

5.2.2. Schemes based on multiple candidates 

More advanced techniques can also be modified and used as horizontal methods. More specifically, the adaptation 
to this scenario of multiple try schemes is particularly interesting. For instance, we adjust two special cases 4 of the 
Ensemble MCMC (EnM) [49] and Multiple Try Metropolis (MTM) methods [25, 40, 28] to fit them within O-MCMC. 
Tables 5 and 6 summarize them. Note that standard parallel EnM and MTM chains can be considered. However, we 
suggest two variants in order to reduce the computational cost. In both cases, L > 1 different i.i.d. samples, Z],..., Z/,, 
are draw from i//(x). In the parallel Ensemble MCMC (P-EnM) scheme, at each iteration t, one resampling step per 
chain is performed, considering the set of L + 1 samples ]zi,... j, n = 1,... ,N, using importance weights. 

In the parallel MTM (P-MTM) scheme, at each iteration t, N resampling steps are performed considering the set of L 


3 For further clarifications, see the extension of this scheme for a Multiple Try Metropolis method described in Section 5.2.3. 
4 They are special cases of the corresponding algorithms, since an independent proposal pdf <// is used. 
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Table 3. Basic mixture scheme for horizontal transitions in O-MCMC. 


1. Build ij/(x) = if/ m (x\P t ) as in Eq. (7), where t = mTy + (m - 1)7//. 

2. For t = mTy + (m - 1)77/ + 1, • • • ,ni(Ty + ThY 

(a) Draw x' ~ if/(x). 

(b) Forrc = 1,... ,7V: 

i. Set x n j = x', with probability 



a n = min 

"j tr(x')<A(x„,,_i) 

’ ^fx„,,_i)iA(x') 


= Oj(x') A (jj(X n /-l), 


where a>(x) = and a Ab = min [a, b], for any a. b e K. Otherwise, set x nJ 

(c) Set = {xi /,..., XjvA 

— 1 • 


Table 4. Variant of the basic mixture scheme for horizontal transitions in O-MCMC. 


1. Build if/(x) = tf/ m (x\P t ) as in Eq. (7), where t = mTy + (m - 1 )77/. 

2. For t = mTy + (m - 1)77/ + 1, • • • ,m(Ty + ThY 
(a) For n = 1,..., N: 

i. Draw x' n ~ if/(x). 

ii. Set x n j = x' n , with probability 




min 


1 7{(x' n )l//(X n f-\) 
' 7T(X„ j( _l)^(x^) 


CO(x' n ) A (0(X„J- 1 ), 


where a)(x) = and a A b = min [a, b\. for any a,b el. Otherwise, set x„ t , = x ;J/ i. 
(b) Set P, = (xi j(> ..., Xnj}. 


candidates {zi,..., //) and the new possible states are tested (i.e., accepted or not) according to suitable acceptance 
probabilities a„, n = involving also the previous states x„,_i. Another alternative and similar technique has 

been presented in [12], and it is described in Appendix C. This variant uses a non-independent proposal pdf and can 
be employed as horizontal step. 

The ergodicity of both schemes is discussed in Appendix E. The algorithms in Tables 5-6 are obtained by a 
rearrangement of the basic schemes in [49, 25, 40] in order to generate, at each iteration r, N new states for the 
N independent parallel chains. The new states of the N chains are selected by filtering the same set of candidates 
[zi ,..., Z/.), drawn from the same independent proposal pdf \j/. Note that, with respect to a standard parallel approach, 
they require less evaluations of the target pdf: at each iteration, the algorithms in Tables 5-6 require L new evaluations 
of the target instead of the NL target evaluations required by a standard parallel approach. For further explanations, 
see Appendix E.1.1 and Figure 10. With L - 1, the algorithm in Table 5 coincides with the application of N parallel 
MH methods with Barker’s acceptance rule [50]. The algorithm in Table 6 with L = 1 coincides with the scheme 
presented in Table 3. Although any L > 1 can be employed, a number of tries L > N is suggested. Note that another 
important difference with respect to the standard parallel implementation is that the generated chains are no longer 
independent. 

5.2.3. Block Independent Multiple Try Metropolis algorithm 

Previously, we have pointed out that with the scheme in Table 6 only L evaluations of the target are required at each 
iteration, instead of NL as in the standard parallel approach. The proposed scheme in Table 6 can also be modified in 
the same fashion of the block independent MH method [11], in order to reduce the number of multinomial sampling 
steps, without jeopardizing the ergodicity of the parallel chains. We remark that the corresponding technique, called 
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Table 5. Parallel Ensemble MCMC (P-EnM) scheme for horizontal transitions in O-MCMC. 



Table 6. Parallel Multiple Try Metropolis (P-MTM) scheme for horizontal transitions in O-MCMC. 



Block Independent Multiple Try Metropolis (BI-MTM), can always be employed when N parallel independent MTMs 
are applied (even outside the O-MCMC scheme) in order to reduce the overall computational cost. Let us assume that 
the value N is such that the number of total transitions of one chain, T H , can be divided in B — jjf- e N blocks. The 
idea is based on using N circular permutations of the resampled set {zg l ,..., Z/, v }, i.e., 

'Vl = {Vl,l = Zh, . . • , Vjv-l,i = Zjfc^.VAU - z k N }» 

*V2 = {v 1.2 = z k N , ■ ■ ■ , Vjv-1,2 = z k N -2 ’ y N,2 = z k N -1 ), 

( 12 ) 

'Vn = {Vl,Af = Zyt 2 , . . . , \N-\,N — z k N ,'/N,N ~ Z*,}, 

where each set 'V n denotes one the N possible circular permutations of {z^,..., z/ f „). In order to preserve the ergodic- 
ity, each z*. is drawn from a different set of tries Sj = {Zj ,..., z^}. More specifically, before a block of N iterations, 
NL tries are drawn from i//(x ), yielding N different sets, Sj = {z^\ ..., z^) for j = each one containing L 

10 
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One Block in BI-MTM 



i i i 

—- --- ! - ► 

t — 1 t t + N 


Figure 5. A graphical representation of one block within the BI-MTM technique, described in Table 13. One specific transition of one MTM chain 
is represented with the probability a n (x n j- 1 , v„j), showing the two possible future states at the t -th iteration, of the n -th chain. One block is formed 
by N transitions. 


elements. Then, one sample z*. is resampled from each S, with probability proportional to the corresponding impor¬ 
tance weight, and the circular permutations in Eq. (12) are created considering {z kl , Z/ ;v [. The complete BI-MTM 
algorithm is detailed in Table 13 and further considerations are provided in Appendix E. In Table 13, we have denoted 
the acceptance probability as a n (x nk -\ , v„j) to remark the two possible future states of the n-th chain at the r-th itera¬ 
tion. Figure 5 depicts a schematic sketch of the different steps of one block within the BI-MTM algorithm. Moreover, 
Figure 10 provides a graphical comparison among different parallel MTM approaches. BI-MTM requires only N 
multinomial sampling steps for each block, i.e., N iterations, instead of N 2 as P-MTM in Table 6. Moreover, BI-MTM 
is completely parallelizable. Indeed, one could draw NLT H samples from i//(x), perform NT H multinomial sampling 
steps within NTh different sets, and then run the 7)/ parallel iterations of the N chains, i.e., one unique block, using 
circular permutations of the NTh resampled tries (previously obtained). The reduction in the computational cost is 
obtained at the expense of a moderate decrease in performance. 

5.3. Computational cost 

In general, the most costly steps are those requiring the evaluation of the target pdf, especially for complex models 
or a large number of data. The number of evaluations of the target, in one horizontal period, are E H = T H for 
SMH in Table 2, whereas E H = LT H in P-EnM and P-MTM (considering, in all cases, only the new evaluations at 
each iteration, the others can be automatically reused). Using SMH, Th multinomial sampling steps are performed, 
each one over a population of N samples. In P-EnM and P-MTM, NTh multinomial sampling steps are required 
(with N > 1), each one over a set of L samples. The total number of evaluations of the target, Ej = M(E v + Eh ), 
including the vertical transitions, is Et - M(NT V + Th) when the SMH is employed in the horizontal steps, or 
E T = M(NT V + LT h ) when P-EnM and P-MTM are employed. Furthermore, in BI-MTM, we have again E T = 
M(NTy + LTh), but only T H multinomial sampling steps. Note also that in a standard parallel multiple try approach 
we would have Eh — NLTh evaluations of the target and NTh multinomial sampling steps, each one over a set of L 
samples. Finally, we remark that, using SMH, we perform one acceptance test in each step, i.e., Th in one horizontal 
period. Using a multiple candidates scheme, we employ NTh acceptance test in one horizontal period. All these 
considerations are summarized in Table 7. For further details and observations, see Appendix E.1.1. 

5.4. Communication cost 

Fet us consider briefly now the development of a truly parallel implementation of O-MCMC that can be distributed 
across different processors/machines. The vertical steps of O-MCMC can be clearly parallelized. However, O-MCMC 
needs a fusion center in order to perform the horizontal steps. In the mixture-based approach, i.e., O-MCMC-PMTM, 
the whole population of current states V, = {x (Lr )'' =| must be transmitted to this fusion center. If the fusion is performed 
after each vertical iteration, i.e., Ty — 1, then some states, x„, e V,, are likely to remain unchanged from the previous 
horizontal step, and thus only certain new (possibly high-dimensional) vectors x„ , have to be transmitted to the fusion 


11 







/ Digital Signal Processing 00 (2016) 1-31 


12 


Table 7. Computional cost of O-MCMC given different horizontal schemes. Recall that the number of epochs is M - 


T 

Tv+Th ‘ 


Computational features 

SMH 

P-EnM and P-MTM 

BI-MTM 

Stand. Parallel MTM 

Eh 

Th 

LT h 

LT h 

NLT h 

Ej — M(Ey + Efj) 

M(NT V + T h ) 

M(NT V + LT h ) 

M(NT V + LT h ) 

M(NT V + NLT h ) 

Total number of 
multinomial sampling steps 

MT h 

MNT h 

MT h 

MNT h 

Cardinality of set for 
the multinomial sampling 

N 

L 

L 

L 

Total number of 
acceptance tests 

M(NT V + T h ) 

M(NT V + NT h ) 

M(NT V + NT h ) 

M(NT V + NT h ) 


center (indeed, the rest of states have been already transmitted to the fusion center in the previous horizontal step). In 
other cases, quantization and differential transmission strategies may alleviate the communication cost. 

Note that this communication problem also occurs in many other state of the art algorithms, although it can 
be reduced through a proper design of the algorithm. For instance, in population-based techniques that employ 
resampling procedures [13, 15], only the scalar importance weights have to be transmitted and, after the resampling 
stage, the fusion center can simply return the indices of the resampled particles. In our O-MCMC-SMH, we can 
follow the same strategy, transmitting only the scalar importance weights, as in [13, 15]. After Th steps of SMH, the 
fusion center returns the novel states to the corresponding chains, that can be identified simply through an index. 

However, in other more sophisticated schemes that construct the importance weights by considering the so-called 
deterministic mixture approach [3, 14, 51], the entire set V, must be transmitted, as in O-MCMC-PMTM. Similarly, 
the technique proposed in [12] and described in Appendix C, requires the knowledge of the L candidates for the 
computation of the weights in Eq. (26). Finally, in the MCMCMC (MC 3 ) method [20, 21], the communication cost is 
reduced w.r.t. O-MCMC by applying exchanges of particles between specific pairs of chains, whereas in the particle 
island approach [52] local resampling stages (which only require a subset of particles) are usually performed, with 
a global resampling stage (that requires all the particles) being performed only occasionally. This kind of strategies 
could be easily incorporated to the O-MCMC framework in order to enhance its distributed implementation. 

5.5. Joint adaptation of the proposal densities 

Let us denote as C„ and A„ the covariance matrices of the vertical and horizontal proposal pdfs, respectively. In 
order to design an algorithm as robust as possible, we suggest keeping the scale parameters C„ fixed for the vertical 
proposal pdfs g„(x|x„ ,_i, C„), to avoid a loss of diversity within the set of chosen variances. However, if desired, 
they could be easily adapted as suggested in [9], On the other hand, we suggest adapting the scale parameters of the 
horizontal proposal pdfs tp„, n = 1,... ,1V, since it is less delicate. Indeed, let us recall that a poor choice of the ipf s 
entails an increase in the computational cost, but the diversity in the cloud of samples is always preserved. Several 
strategies have been proposed in [7, 53] and [9], for adapting proposal functions online within MCMC schemes. For 
the sake of simplicity, we discuss separately the cases of the population-based or the mixture-based approaches. 

• Adaptation within SMH: in this case, the strategies in [53, 9] are appropriate. Thus, After a training period 
Ttrain < T, all the generated samples (i.e., for each t > T rraln and from all the chains) can be used to adapt the 
location and scale parameters of the proposal pdf tp(x). Namely, denoting ip,(x) = ip(x;p t , A,), we can use the 
following approach: 

- If t < T tra i „: set //, = po, A, = Ao (where po and Ao are the initial choices). 

- If t > Ttram- set p t = jf, 2;= i 2^=i x n,j, and A, = A Yj‘j= 1 2^=i ( X «J “ Ht)( x n,j - /h) T + C, where C is a 
chosen covariance matrix. The empirical mean and covariance matrix estimators can also be computed 
recursively [7]. 

• Adaptation of the mixture f(x): the methods in Section 5.2 employ a mixture if/(x) = ^ Y,^ = i <Pn(*)- In this case, 
every component 


<PnA x ) = 
12 
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should be adapted, jointly with the weights of the mixture. A possible (and simple) adaptation scheme is 
provided in [7], where all the parameters of the mixture are updated online. The method in [7] can be easily 
reformulated for a framework with parallel chains. In this case, the states of the parallel chains are divided 
into N different clusters according to the Euclidean distance between them and location parameters of the 
N components in the mixture i//(x). Then, new centroids (i.e., location parameters), covariance matrices and 
weights are updated according to the mean, covariance and cardinality of each cluster, respectively. 

Finally we remark that, within the O-MCMC framework, it is straightforward to apply the well-known diagnostic 
criteria in[41, 42, 43, 44, 45] in order to estimate the “burn-in” period, and hence to help the design of the adaptation 
scheme [9], 

6. O-MCMC for optimization 

The O-MCMC schemes can be easily modified converting them in stochastic optimization algorithms. Indeed, it 
is possible to replace the N vertical MH chains with N parallel simulated annealing (SA) methods [30, 31]. Let us 
denote as y nJ e (0, +oo) a finite scale parameter that is a decreasing function of r, approaching zero for t —> +oo, i.e., 

[ Yn.! ^ Yn,t+ 1 ^ ^ YnJ+r ^ 0 , 

| lim y, u , = 0, ^ 13 ' ) 

V t—>+oo 

for n = 1 ,N. Moreover, for the sake of simplicity, we consider symmetric proposal functions q„( y|x) = g„(x|y). 
Then, one transition of the n-th SA is described below: 

1. Draw x' ~ q n (x |x„,,_i). 

2. Set x n j = x' with probability 


a n = min 1, 


[tKx')] 

WViF 


= min 1, 


ftfa') 

ftfan,t— 1) 



Otherwise, i.e., with probability 1 - a„, set x nJ = x, if _|. 


Note that, with respect to the MH algorithm, we have replaced the target 7r(x) > 0 with a modified target [tt(x)] ?».< > 0, 
with modes that become sharper and narrower when we reduce the scale parameter y nf . Note also that the move¬ 
ments such 7r(x') > 7 t(x„,,_i) are always accepted, whereas movements leading to 7t(x') < 7r(x„ jf _i) are accepted with 
probability 


Pd 


ftfa') 
ft fan,t— l) 


7n,t 

e«U], 


This probability Pj —> 0 vanishes to zero as y nJ —» 0 (guaranteeing the convergence to the global maximum when 

1 

t —> +oo). In the same fashion, the modified target [/r(x)] ?».< is employed in the horizontal transitions of the mixture- 
based approach, whereas for the horizontal steps of the population-based approach we consider the modified extended 
target, 

N 

7T g (X U ... ,X N ) OC |~”[[7!-(X„)]^, (14) 

n= 1 


so that all the presented schemes, previously described, can be automatically applied. Several possible decreasing 
functions y iu have been suggested in [30, 32, 31]. For sampling and optimization purpose, instead of using an artifi¬ 
cial sequence of auxiliary parameters y nJ , y„ J+ \,..., y„ if+T , an alternative is to use the so called “data point tempered" 
techniques [54] where a sequence of P posteriors, 7Ti(x), 7T2(x),...,7rp(x), with an increasing number of data, are con¬ 
sidered (typically, for sampling purpose the last one contains all the data, i.e., 7i>(x) = 7r(x)). 
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7. Relationship with other techniques 

First, we recall that in this work we focus on population-based Monte Carlo schemes designed in order to foster 
the exploration of the state space and improve the overall performance. Note that the techniques shown in Table 5 and 
6 are interesting since they involve the use of resampling steps without jeopardizing the ergodicity of the resulting 
global O-MCMC process. Moreover, the SMH algorithm in Table 2 employs an inverted resampling scheme, since a 
sample in the population is chosen to be replaced with probability proportional to the inverse of its importance weight. 
Other methodologies in the literature employ a combination of MCMC iterations and resampling steps. An example 
is sequential Monte Carlo (SMC) sampler for a static scenario [15] described in Appendix B.2. The underlying idea 
could be interpreted belonging to the O-MCMC philosophy: in these methodologies, the resampling steps are seen 
as a “horizontal” approach for exchanging information within the population. The resampling procedure generates 
samples from a particle approximation 

L 

; : (L \x) = J]/3 e 6(x-z e ), (15) 

e=i 

of the measure of 7 r(x), where Z{ ~ i//(x) (or, similarly, Z( ~ qe(x) [51]) and (it are defined in Eq. (10) in Table 6 , 
with £ = 1 ,,L. The quality of this approximation improves as the number L of samples grows. However, for a 
finite value of L there exists a discrepancy which can produce problems in the corresponding sampling algorithm. For 
further details see Appendix B. One important issue is the loss in diversity in the population. 

This problem is reduced in O-MCMC, since the ergodicity is ensured in both the vertical and the horizontal 
movements. This improvement in the performance is obtained at the expense of increasing the computational cost. 
For instance, let us consider the use of SMH in horizontal transitions. The cloud of samples is not impoverished by the 
application of SMH, even if a poor choice of the proposal tp(x) is made. In the worst case, the newly proposed samples 
are always discarded and computational time is wasted. In the best case, a proposal located in a low probability region 
can jump close to a mode of the target. Clearly, in the mixture multiple try approach, it is better to choose L > N 
for fostering the safeguard of the diversity. Moreover, in the mixture approach, the mixture i//(x) = i(/(x\ V,) is built 
using the states in V t as location parameters, and then it does not change for the next Tu horizontal steps. Thus, the 
information contained in the states {x„ ( }' v =| € V, is employed in the next Th iterations even if some states are not 
well-located. For clarifying this point, consider for instance the basic scheme in Table 3. The mixture if/(x) = i//(x\P,) 
does not change, so the information provided by the population V, = jx| f ,..., x,y,) at the iteration t is still used in the 
iterations t + -T h . This feature is also the main difference between the scheme in Table 3 and the NKC-type 

methods [48], where one component of the mixture is relocated after each iteration. Unlike the MCMCMC (MC 3 ) 
method [19, 20, 21, 22], in O-MCMC the exchange of information occurs taking always into account the whole 
population of current states, instead of applying exchanges between specific pairs of chains. Similarities with the 
technique proposed in [12] are discussed in Appendix C. 

8. Numerical simulations 

8.1. Multimodal target distribution 

In this section, we consider a bivariate multimodal target pdf, which is itself a mixture of 5 Gaussian pdfs, i.e., 

1 5 

7 r(x) = 7 r(x) = - ^ A((x; Vi, G,), x e R 2 , (16) 

^ i= 1 

with means V\ = [-10,-10] T , v 2 = [0,16] T , v 3 = [13,8] T , V 4 = [—9,7] T , and vg = [14,-14] T , and with covariance 
matrices 

Gi = [ 2 , 0 . 6 ; 0 . 6 , 1 ], G 2 = [ 2 , -0.4;-0.4, 2 ], 

G 3 = [2, 0.8; 0.8, 2], G 4 = [3, 0;0, 0.5], and 
G 5 = [2, -0.1;-0.1, 2]. 
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Then, the target pdf k(x) has 5 different modes. We apply O-MCMC to estimate the expected value 7?[X] of X ~ 7 r(x) 
(the true mean is £[X] = [1.6,1.4] T ) using different values for the number of parallel chains N e {5,100,1000). 
Furthermore, we choose deliberately a “bad” initialization to test the robustness of the algorithm. Specifically, we set 
x„ o ~ 7/([—4,4] X [-4,4]) for n = 1,... ,1V. This initialization is “bad” in the sense that it does not cover the modes 
of 7 r(x). In all cases, we consider MH vertical kernels, with q n (x |x„,,_i) = /V(x;x„, |,C„) as proposal pdfs, using the 
same isotropic covariance matrix, C„ = <x 2 I 2 , for all n = N. We test different values of cr e [2,5,10,70} to 
gauge the performance of O-MCMC. In O-MCMC, we consider the application of SMH and P-MTM as horizontal 
techniques, as described below. In both cases, we adapt the covariance matrices of the proposal pdfs as suggested in 
Section 5.5. 

• O-MCMC with SMH: As horizontal proposal, we use again a Gaussian pdf, tp t (x) = AC(x; //,. A,) where /i, 
and A, are adapted online: namely, /i, = A £'- =1 x «j, and A, = A Yl ]= \ 2^=i( x «j ~ /h)( x «j - Etf + A 0 , 
where /in = [0,0] T , Ao = A 2 I 2 with A = 2. 5 As remarked in Section 5.5, this adaptive procedure is quite robust 
since employs samples generated by different parallel chains [9]. Furthermore, we fix T = 4000 and Th — Tv- 
We test different values of TV e [1,100} and, as a consequence, M = t J +Th = jf - e [20,2000). 6 Recall that 
the total number of evaluations of the targets in O-MCMC with SMH is Ej = M(NT V + Th) = §(A + 1) (see 
Section 5.3). 

• O-MCMC with P-MTM: We also test the O-MCMC scheme with P-MTM as horizontal technique. In this 
case, the (independent) proposal pdf is the mixture i//(x) - t/r,„(x|!P,) — jj ^n( x |x n ,f, A,) with t = mTy + 
(m - 1)77/, A, = A Yl j= i Z£Li( x nj - /F)( x «,j - EtV + A 0 , where n, = A £'. =1 £^ =1 x nJ , A 0 = 4I 2 (for all 
n - 1,..., N). We consider different number of tries L = {5,50} and set again Ty - Th- In this case, the 
number of evolution of the target is Ej — M(NTy + LTh) - j(N + L) (see Section 5.3). 

Comparison with independent parallel chains (IPCs). We compare the performance of O-MCMC with the appli¬ 
cation of IPCs, namely, only vertical independent transitions (also in this case the initial state is chosen randomly 
for each chain at each run, i.e, x„ o ~ 7/(1 -4,4] x [-4,4]) for n = 1,..., N). Therefore, we can infer the benefit of 
applying the horizontal interaction. For a fair comparison, in IPCs we use the same MH kernels, i.e., with the same 
proposals q n ' s, and we keep fixed the total number of evaluations of the target E r in both cases, O-MCMC and IPCs. 
Note that Ej = NT' in IPCs where N is the number of chains and T' the total number of iterations for each one. We 
test different values of N. Tables 8 and 9 show the Mean Square Error (MSE), averaged among the two dimensions, 
in the estimation of the expected value £[X] = [1.6,1.4] T , averaged over 200 independent runs. O-MCMC with 
SMH always outperforms IPCs, specially for small cr and N. O-MCMC shows a much more stable behavior w.r.t. the 
parameter choice cr. For large scale parameters (cr e {10,70}) and a large number of chains (N e [100,1000}), the 
MSE of IPCs approaches the MSE of O-MCMC. A possible explanation is that the interaction is particularly useful 
with small N and a wrong choice of cr, whereas the use of large number of chains such as N = 100 or N - 1000 is 
enough, in this bidimensional example, for obtaining good performance. Moreover, O-MCMC with SMH presents an 
anomalous behavior when the variance of the vertical proposal pdfs is cr — 2. In this specific case, i.e., only for cr - 2, 
the MSE seems increases with N. However, note that O-MCMC provides the lower MSE, in any cases, comparing 
with the same computational effort E r (with the exception of O-MCMC with P-MTM and cr = 10). 

Comparison with a single MCMC chain. We test a single MH chain, i.e., N — 1, with a longer length T of the chain, 
in order to perform the same number of evaluation of the target Ej- Note that, in this case, Ej — T. Furthermore, 
we test the adaptive MH method (A-MH) [53] and the delayed rejection MH method (DR-MH) [47], For A-MH, we 
consider 10% of the total iterations as a training period (before adaptive the covariance matrix of the proposal). In 
DR-MH, we consider at most 3 acceptance test before deciding the next state of the chain. At the f-th iteration, in 
each acceptance test of DR-MH, we use a Gaussian proposal pdf with mean the average between the previous mean 
value and the point rejected at the previous test (at the first stage, the proposal pdf has the current state x, as mean). 
Since in DR-MH we can have more than one evaluation of the target at each iteration (at most 3), and since we fix the 


5 We set Ti ra [ n = Ty, i.e., the adaptation starts after that the samples of the first vertical period are collected. Thus, before of the first horizontal 
step, (p t (x ) has been already updated. However, an estimation of the “bum-in” period could be used for automatically tuning T tra i n [43]. 

6 We use all the generated samples in the estimation without removing any “bum-in” period. 
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total number of evaluations Ej, in general the total number of iterations T is random and varies at each run. Again 
we set xo ~ *£/([—4,4] x [-4,4]), randomly chosen at each independent mn, for each method. The results in Tables 8 
and 10 show that in this example the use of parallel chains is more convenient in terms of performance. Namely, IPCs 
and O-MCMC provide a smaller MSE than a single-longer MCMC chains. 

Comparison with Population Monte Carlo (PMC). We also compare with the standard PMC technique [13], de¬ 
scribed in Appendix B. We use N e [100,500,2000} and T = 2000 for PMC, so that the total number of evaluations 
of the target is Ej -NT e [2 -10 5 ,10 ■ 10 5 ,40 • 10 5 }. The proposal pdfs used in PMC are the same that we apply for the 
vertical chains in O-MCMC, i.e., c/ H (x\x nJ |) = Af(x;x„ jf _i, C„) using again the same covariance matrix, C„ = cr 2 I 2 , 
for n = 1,..., N and cr e [2,5,10,70} (again x„ 0 ~ "Z/([-4,4] X [-4,4]) for n = 1,... ,1V). We have considered a 
higher number of Ej for PMC with respect to O-MCMC, since O-MCMC involves several acceptance tests which 
are not contained in PMC. Thus, in order to provide a comparison as fair as possible, we allow a greater number 
of evaluations of the target, Ej, for PMC. Table 10 shows the MSE (mean of the MSEs of each component) of the 
O-MCMC schemes and the PMC method, for estimating £[X], We can see that the O-MCMC schemes, even with 
less E t , provide lower MSEs with the exception of the cases corresponding to cr = 10. 

Comparison with an adaptive SMC scheme. Finally, we compare with the SMC scheme described in Appendix B.2, 
where N parallel MCMC chains, generating the population [x„ jf }^ =1 , and the interaction is performed by a resampling 
step, after drawing N samples {x„ 2+ i }' v =| , each one from x„, /+ i ~ (p„(x\x , u , A,) (we set Ty = 1). The resampling plays a 
role similar to the orthogonal steps in O-MCMC. Thus, for providing the fairest comparison as possible, we also con¬ 
sider here and adaptive covariance matrix A, = ^ £' = 1 (x nj - - ft,)(x nJ - ft,) T + A 0 , where ft, = ^ £' =1 ^=\ *nj 
and Ao = 4I 2 . Also, in this case, we have x„ q ~ Tl([- 4,4] X [-4,4]) for n = 1,..., N, as for O-MCMC. Note that at 
each f-th iteration, 1 - 1 ,,T, the resample-move SMC scheme performs a multinomial resampling with cardinality 
N and then one step of N parallel MCMC chains. As a consequence, the total number evaluations of the target is 
Ej - 2NT, recalling that we set T v - 1 (see App. B.2). The results shown in Table 10. In general, O-MCMC 
outperforms SMC, considering a similar number Ej of target evaluations. For further comparison between O-MCMC 
and SMC see Section 8.3. 

Computational times (in seconds) are also provided in Table ll . 7 We can observe that, with a Matlab implemen¬ 
tation, O-MCMC is also competitive in terms of computational time. Due to the efficient matrix operations (at least 
with a Matlab implementation), the use of parallel chains is always more convenient in terms of computational time 
than the use of a single-longer chain (given a fixed number Ej of target evaluations). However, in a specific scenario, 
a single chain with a longer run could perform better than shorter parallel chains. In this highly multimodal example 
the use of IPCs is more appropriate (see Tables 8 - 9). 



O-MCMC with SMH \ 

Independent parallel chains (IPCs) | 

Single MH chain 

N 

5 

100 

1000 I 

_ 5 _ 

100 

1000 | 

1 

Tv 

1 

100 

1 

100 

1 

100 

J _ 1 _ l 

T 

cr = 2 

1.4881 

2.3649 

1.7515 

2.9146 

5.6803 

6.1354 

28.7856 

8.2925 

7.3543 

89.6476 

87.1911 

41.6461 

cr = 5 

1.4989 

2.1724 

1.4512 

1.7089 

1.3606 

1.4825 

13.0602 

2.2842 

1.8373 

47.7092 

5.8160 

0.6027 

cr = 10 

1.1769 

1.4034 

0.1062 

0.1129 

0.0142 

0.0139 

2.4443 

0.1247 

0.0128 

2.6611 

0.1397 

0.0274 

<x = 70 

1.8175 

2.0730 

0.3554 

0.3483 

0.2866 

0.2815 

5.4897 

0.5469 

0.3264 

7.5976 

0.5103 

0.3271 

T 

4000 j 

2400 

2020 

2002 

12 - 10 3 

2.02 ■ 10 5 

20.02 ■ 10 5 

Ej 

12 ■ 

10 J 

2.02 

• 10 3 

20.02 

■ 10 3 | 

12 - 10 j 

2.02 ■ 10 3 

20.02 ■ 10 3 

12 - 10 3 

2.02 ■ 10 3 

20.02 ■ 10 3 


Table 8. Mean Square Error (MSE) in the estimation of the mean of the target, using O-MCMC with SMH and IPCs, considering different values 
of cr and Ty (recall, we set Tv = 7//). The total number of evaluations of the target Ej is the same for O-MCMC (where Ej = ^(N + 1) since 
Ty = Th ) and IPCs (where Ej = NT). Note that Ej = T for the single MH chain. 


7 In order to provide the computational times, the methods are tested in a Laptop-Mac-Processor 1.7 GHz-8 GB-1600 MHz-DDR3. A prelimi¬ 
nary Matlab code of O-MCMC-SMH is also provided at http: //www. mathworks. com/matlabcentral/f ileexchange/58207- omcmc- smh? 
s_tid=srchtitle (note that this code is not optimized). 
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O-MCMC with P-MTM j 

j IPCs 

Single MH chain 

Single A-MH 

Single DR-MH 

N 


5 

50 | 

i 5 

50 

1 

1 

1 

Tv 

I 

! r 

T 

T 

T 

L 

5 

50 

5 

50 

- 

- 

~ 

- 

- 

~ 

3 

cr = 2 

1.3907 

1.1421 

1.3156 

0.7678 

17.1352 

3.5950 

91.7211 

87.0583 

85.4229 

36.2071 

80.5092 

<r = 5 

1.6159 

0.9074 

1.4011 

1.0072 

12.5791 

2.3277 

37.9583 

10.3945 

7.1659 

2.9921 

6.5127 

<T = 10 

1.5738 

0.8634 

1.0982 

0.8379 

0.9403 

0.1134 

1.4694 

0.2697 

0.1375 

0.1287 

0.1385 

T 


4000 


f 4000 

2- 10 4 

11 ■ 10 4 

20- 10 4 

20 ■ 10 4 

T' : dep. on the amount 












of rejections at each step 

Ej 

2- 10 4 

11 • 10 4 

111 - 10 4 

20 ■ 10 4 

2 ■ 10 4 

20- 10 4 

2- 10 4 

11 • 10 4 

20- 10 4 

20- 10 4 

20- 10 4 


Table 9. Mean Square Error (MSE) in the estimation of the mean of the target, using O-MCMC with P-MTM and IPCs, considering different 
values of cr and Ty (recall, we set Ty = 7//). The total number of evaluations of the target Ej is the same for O-MCMC (where Ej = + L) 

since Ty = Th ) and IPCs (where Ej = NT). Note that Ej = T for the single MCMC chains. However, for DR-MH, we have Ej = T' where T' 
depends on the number of rejections occurred in the specific run. The maximum number of acceptance tests in DR-MH, at the same iteration, is set 
to 3 (this parameter plays a similar role of the number of tries L in the MTM schemes). 



O-MCMC with SMH 

O-MCMC with P-MTM | 

| PMC 1 

| adaptive SMC | 

N 

5 

100 

1000 

5 

50 

100 

500 

2000 

100 

200 

cr = 2 

1.4881 

1.7515 

5.6803 

1.3907 

1.3156 

48.11 

35.1772 

28.4326 

2.1114 

1.1252 

cr = 5 

1.4989 

1.4512 

1.3606 

1.6159 

1.4011 

2.5998 

2.3230 

1.9153 

2.0800 

1.1501 

cr= 10 

1.1769 

0.1062 

0.0142 

1.5738 

1.0982 

0.0512 

0.0141 

0.0054 

1.6845 

0.9873 

cr = 10 

1.8175 

0.3554 

0.2866 

2.0185 

1.8019 

2.3963 

0.8252 

0.1161 

1.8899 

0.9943 

T 

4000 [ 

| 2000 [ 

| 1000 j 

Ej 

12- 10 3 

2.02 ■ 10 5 

20.02 ■ 10 5 

2 ■ 10 4 

11 - 10 4 | 

| 2 - 10 s 

10 ■ 10 5 

40 ■ 10 5 

| 2 - 10 s 

4- 10 5 | 


Table 10. Mean Square Error (MSE) in the estimation of the mean of the target, using O-MCMC (Ty = Th = l and L = 5 for P-MTM) and the 
standard PMC method [13]. The total number of evaluations of the target is Ej = j(N+l) for O-MCMC-SMH, Ej = ^(N+L) for O-MCMC-SMH 
(since Ty = Th for both), Ej = NT for PMC, and Ej = 2NT for adaptive SMC. 


8.2. Spectral analysis: estimating the frequencies of a noisy multi-sinusoidal signal 

Many problems in science and engineering require dealing with a noisy multi-sinusoidal signal, whose general 
form is given by 

d x 

y c (r) = Aq + Aj cos(2 nfiT + 0,-) + r(r), r e R, 

i= 1 

where Aq is a constant term, d x is the number of sinusoids, {A,}^ is the set of amplitudes, {2nf}'j^ l are the frequencies, 
[0,]/^ their phases, and r(r) is an additive white Gaussian noise (AWGN) term. The estimation of the parameters of 
this signal is required by many applications in signal processing [55, 56], in control (where a multi-harmonic distur¬ 
bance is often encountered in industrial plants) [57, 58] or in digital communications (where multiple narrowband 
interferers can be roughly modeled as sinusoidal signals) [59, 60]. Let us assume that we have d y equispaced points 
from y c (r), obtained discretizing y c (j) with a period T s < maX[ ~- (in order to fulfill the sampling theorem [61]): 

d x 

y[£] = Aq + 2 Aj cos (Cljk + (pj) + r[k], k — 1 ,..., d y , 

i= 1 


where y[&] = y c (kT s ) for k = 0,1,..., d y — 1, £2, = 2nfT s for i - 1,..., d x , and r[k\ ~ N(0, <x la ). Our goal is applying 
the O-MCMC-type algorithms to provide an accurate estimate of the set of unknown frequencies, {Q,|fl| or merely 
. For keeping the notation of the rest of the work, we define the vector of possible frequencies as x e W Ix . Thus, 

given a fixed considering the hyper-rectangular domain D = [ 0, ^ | (it is straightforward to note the periodicity 
outside £)), and a uniform prior on 2), the posterior distribution given K data is 7 f(x) oc exp (-V(x)), where 


V(x\,...,x d J 
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E t = 10= j 

Method 

N = 100 

A = 500 

N = 1000 

N = 2000 

N = 3000 

N = 5000 

N= 10 4 

O-MCMC-SMH (T v = T„ =\) 
O-MCMC-SMH (T v = 10, T„ = 1) 

PMC 

Adaptive SMC 

8.14 (P4) 
2.33 (P3) 

1.02 (P2) 
0.84 (PI) 

2.16 (P4) 
0.53 (P3) 
0.52 (P2) 
0.33 (PI) 

1.21 (P4) 
0.29 (PI) 
0.60 (P3) 
0.37 (P2) 

0.67 (P3) 
0.21 (PI) 
0.87 (P4) 
0.48 (P2) 

0.54 (P2) 
0.12 (PI) 
1.22 (P4) 
0.63 (P3) 

0.39 (P2) 
0.11 (PI) 
1.81 (P4) 
0.93 (P3) 

0.26 (P2) 
0.09 (PI) 
3.27 (P4) 
1.64 (P3) 

Single MH chain (N = 1 and T = E r = KP) 
Single A-MH chain (N = 1 and T = Ej = 10 5 ) 
Single DR-MH chain (N = 1 and T = Ej = 10 5 ) 

83.43 (P5) 

83.90 (P6) 

88.26 (P7) 


Table 11. Computational time (sec.) of the different algorithms with a Matlab implementation, as function of V and keeping fixed Ej = 10 5 . The 
other parameters of the techniques vary in order to keep fixed Ej = 10 5 as V grows. We have also highlighted the ranking as PI, P2, P3.P4, P5, 
P6, and P7, where PI is the fastest method and P7 the slowest one. 


We have denoted Io(x) the indicator function such that I®(x) = 1 if x e D and I®(x) = 0 if x <t '£). Moreover, for 
the sake of simplicity, we have assumed also that S and cr 2 are known. Furthermore, we set Ao = 0, A,- = A = 1 and 
<pi = 0 . 8 Note that the problem is symmetric with respect to the hyperplane x\ = X 2 = ... = Xd x (and, in general, 
multimodal). Bidimensional examples of V(x) - log 7 r(x) are depicted in Figure 6 . We apply O-MCMC, comparing 
with IPCs, in two different types of experiments described briefly below. In all cases, we set x„ o ~ Tf(D) for 
n = 1 ,... ,N, Th = Ty = 1, and consider the proposals < 7 „(x|x„, f _i) = ,/V(x;x„ if _i,C„) with C„ = cr 2 Ij it n = 1 
for the vertical chains ( 1 ^ is the unit matrix of dimension d x x d x ). 

First experiment. We set f = [/] = 0.1,/a = 0.3] T and generate d v = 10 synthetic data from the model. Since in this 

case, d x — 2 and '£) — |(), 4 | , it is possible to approximate the posterior with a very thin grid and compute the first 
4 non-central moments and, as a consequence, we can compare the performance of different Monte Carlo sampling 
methods. Then, we test O-MCMC-SMH with the horizontal proposal <p(x) = N(x \//, A) where // = [0.25,0.25] T and 
A = cr 2 h, i.e., uses the same cr considered for the vertical chains (recall that C„ = cr 2 I 2 ). We set the total number 
of target evaluations E T — M(N + 1) e {2730,5450,10.9 • 10 3 }. For a fair comparison, we consider N independent 
parallel chains (IPCs) choosing T such that E’ T = NT is equal to E T , i.e., E’ T - E r (see Section 5.3). We test different 
values of cr e [0.05,0.5] and N e {2,5,10). We test the combinations of number of chains N and epochs M (T for 
IPCs) in order to keep fixed Ej. The Relative Error (RE) in the estimation, averaged over 500 independent runs, is 
shown in Figure 7. We can observe that O-MCMC (solid line) outperforms IPCs (dashed line) providing lower REs. 
The performance becomes similar as the computational effort Ej grows since the state space in the first experiment, 
£> = [ 0 , , is small enough for allowing an exhaustive exploration of D by independent chains. 

Second experiment. We test O-MCMC in higher dimension considering d x = 4, i.e., 1) = 1 0, ^ | . We fix f = [/] = 
0.1, / 2 = 0.2, / 3 = 0.3,/q = 0.4] T . In this experiment, we consider an optimization problem for finding the global 
mode of n with d y = 30 observations. With d y = 30 observations, the main mode is enough tight around f, so that we 
consider f as true localization of the global mode. For simplicity and for breaking the symmetry, we restrict the search 
to the simplex contained in D with vertices at [0,0,0,0] T , [y,0,0,0] T , [y, y,0,0] T , [ 5 , y,0] T and [j, j, y, j] T . 

We test O-MCMC-PMTM considering again Gaussian horizontal proposals in the mixture t//, with A = /l 2 I 4 for all 
n. We test A - 0.1 and A — cr (where cr is employed in the covariance matrices C„ = cr 2 I 4 of the vertical chains). 
Moreover, we test the adaptation of A, i.e.. A, = ^ Z„ V =, ( x n. j -/tXNjj ~EtY + A 0 , where n, = A Yl ]= \ Z,T=i x nj » 

Ao = 0.02I 4 , for all n — 1,..., N. We set N = 20 as number of chains, L — 20 as number of tries, and Ej « 8700 
as total number of evaluations of the target. For a fair comparison, we again consider N and T for IPCs such that 
E' t = NT is equal to Ej, i.e., E' T = Ej. The vertical proposal pdfs are the same than those for the IPCs scheme. 
Furthermore, we apply a data-tempering approach [54] described in Section 6 , employing a sequence of 29 target pdfs 
iti each one considering an increasing number of observations K, = 2 + (i - 1) with i — 1,... ,29. The computational 
effort Ej is distributed uniformly in each 7ij. We compute the Relative Error (RE) of the last states of the N chains 
with respect to the true vector f. Figure 8 depicts the curves of the RE versus different values of cr e [0.05,0,5], We 
can observe that O-MCMC-PMTM always outperforms IPCs in this optimization problem. 


8 Let us remark that the estimation of all these parameters would make the inference harder, but can be easily incorporated into our algorithm. 
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Figure 6. Several examples of function V(x) = log7r(x) with d x = 2, given different realizations of the measurements y[l],. .. ,y[K]. In Figures 
(a)-(d)-(e)-(f), we set f = [g, ^] T and consider d y = 10,15,20,30 observations, respectively. In Figures (b)-(c), we set f = [0.1,0.3] T and d y = 10 
observations. Black dotted points shows all the states generated throughout an O-MCMC-PMTM run with N = 10, L = 10 and T = 500. The 
initial states are chosen uniformly within D = [0, |] T . 





(a) E t = 2730 (b) E r = 5450 (c) E T = 10900 


Figure 7. Relative Error (averaged over 500 runs) in the first experiment for O-MCMC-SMH (solid line) and IPCs (dashed line) with different com¬ 
putational effort Ej. Note that O-MCMC always outperforms IPCs. Note also that their performance becomes similar as the overall computational 
cost Et grows (due to the small size of the state space, = [o, ^ j ). 


Third experiment. Now we test O-MCMC in many different dimensions d x € {2,3,30}. The ground truth 
is the fiC-dimensional vector f = [/i,/ 2 , with f = i.e., equally spaced within the interval [0 0.5]. 

The problem is again finding the global mode of the target n, which in this case includes d y = 200 observations. We 
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(a) /l = 0.1 


(b) A = cr 


(c) adapting A, 


Figure 8. Relative Error (averaged over 500 runs) in the second experiment (dimension d x = 4) searching the global maximum with O-MCMC 
(solid line) and IPCs (dashed line). In the O-MCMC-PMTM scheme, we use different A = 0.1 and A = cr, for the covariance matrices A n = A 2 14 of 
the horizontal proposal pdfs in the mixture if/. Moreover, we test an adapted covariance matrices for the horizontal proposal pdfs (see Figure (c)). 




Figure 9. MSE and relative Error (averaged over 100 runs) as function of the dimension d x , in the third experiment (with d y = 200 observations) 
for O-MCMC-SMH (solid line) and IPCs (dashed line) with the same computational effort Ej. Note that the performance degrades when the 
dimension is increased, but O-MCMC always outperforms IPCs. 


consider f as true localization of the global mode for similar aforementioned reasons. The search is now restricted 
to the subset of R rf ' where the dimensions of x are decreasingly sorted. Now the proposed O-MCMC-PMTM uses 
Gaussian horizontal proposals in the mixture ij/, with A = A 2 l d} for all n. Suggested by the second experiment, we set 
/I = 0.5 for the horizontal steps, and cr = 0.25 for the vertical chains. We set N - 50 as number of chains, L - 10 as 
number of tries, and £Y ~ 47040 as total number of evaluations of the target. For a fair comparison, we again consider 
N and T for IPCs such that E' r = NT is equal to E r . i.e., E' T = E r . The vertical proposal pdfs are the same than those 
for the IPCs scheme. The data-tempering approach of [54] is implemented with a sequence of 7 target pdfs n, each 
one considering an increasing number of observations \dy \ ..., d ( y ] | = [2,5,10,20,50,100,200] with i = 1,..., 7. We 
compute the MSE (adding the MSE of every dimension) and the Relative Error (RE) of the last states of the N = 50 
chains with respect to the true vector f. Figure 9 depicts the curves of the MSE and RE versus different values of 
cr e [0.05,0,5]. Note that the performance degrades when the dimension is increased, but O-MCMC again always 
outperforms IPCs. 

8.3. Localization in a Wireless Sensor Network 

In this section, we address the problem of positioning a static target in the two-dimensional space of a wireless 
sensor network using only range measurements. More specifically, we consider a random vector X = [X\,X 2 ] T to 
denote the target’s position in the R 2 plane. The position is then a specific realization x. The measurements are 
obtained from 6 range sensors located at hj = [1, - 8 ] T , h 2 = [ 8 ,10] T , h 3 = [-15, -7] T , I 14 = [- 8 ,1] T , h 5 = [10,0] T 
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and h ( , = [0,10] T . The measurement equations are 

Yj, r — - 201 og(||x - h y || 2 ) + &j, j = 1 ,... , 6 , r - ,.,d y , (17) 

where ®j ~ N(0j\O, w 2 I), with a>j - 5 for all j e 1,..., 6 . Note that the total number of data is 6 d y . We consider a 
vague Gaussian prior pdf with mean [0,0] T and covariance matrix [ut^ 0; 0 o>q] t with ojq = 10, 

We simulate 6 d y = 360 measurements from the model (d y = 60 observations from each sensor), fixing x\ =3.5 and 
xi = 3.5. Our goal is to compute the expected value of the posterior 7 r(x|y), using different Monte Carlo techniques. 
Since we consider a fixed sequence of observations, for comparing the performance of the different methods, we first 
approximate the expected value of 7 r(x|y) using an extremely thin grid obtaining £[X] « [3.415,3.539] T (so that we 
compare the Monte Carlo approximation with these true values). 

We compare O-MCMC-SMH with SMC in App. B.2 both with adaptation covariance matrix of the proposal in 
the “horizontal” step (i.e., used in the resampling in SMC), as suggested in Section 5.5. In both cases, we consider 
MH vertical kernels, with g„(x|x„,,_i) = ,/V(x;x„ if _i, C„) as proposal pdfs, using the same isotropic covariance matrix, 
C„ = <x 2 I 2 , for all n - 1,...,A, and cr e {1,2}. As horizontal proposal in O-MCMC-SMH we use a Gaussian pdf, 
ip,(x) = A/Yx;//„ A,) where//, and A, are adapted online,//, = A £' =| £„ V =i x nJ , and A, = ^ £' =1 YH=i (x„, 2 -/i,)(x„j- 
//,) T + Ao, where //q = [0,0] T , A 0 = A 2 I 2 with A = 0.2, and T train = T v . The “horizontal” proposals in SMC are 
<^„(x|x, a , A,) = Af(x;x„.,, A,), for n = 1,..., A, and A, = A £' =| £f =1 (x nj - - //,)(x„, ; - - //,) T + A 0 is adapted as in 
O-MCMC-SMH. 

We set Th — 1 and number of epochs M = 100, for both algorithms. Then, we test different values of A parallel 
chains, and vertical steps 7V.The total number of evolution of the target is Ej = M(NT V + Th) = 100(A7V + 1) for 
O-MCMC-SMH and Ej - 100A(7V + 1) for SMC (see Appendix B.2). We repeat the experiments 200 times (with 
independent runs) and average the results. At each run, the initial states are chosen randomly, x„ q ~ TT ([-1 0,10] X 
[-10,10]) for n = 1,..., N. Table 12 gives the MSE in estimation of the expected value of the posterior, with the 
different methods. The results in Table 12 confirm that O-MCMC outperforms SMC even with less computational 
cost (a smaller £Y). This shows that the advantage of replacing the resampling procedure with an orthogonal MCMC 
technique like SMH, in this case. 



O-MCMC with SMH 

adaptive SMC | 

N 

2 

5 

10 

J_5_1 

10 

100 | 

Tv 

2 

20 

2 

20 

2 

20 

2 

20 

2 

20 

2 

20 

cr = 1 

0.1507 

0.0021 

0.0923 

0.0008 

0.0522 

0.0003 

9.2133 

0.0233 

9.8118 

0.0065 

4.0167 

0.0022 

<x = 2 

0.1546 

0.0029 

0.0951 

0.0011 

0.0536 

0.0004 

3.8216 

0.0684 

3.6501 

0.0593 

1.1419 

0.0370 

Ej 

500 

4100 

1100 

10100 

2100 

20100 

1500 

10500 

3000 

21000 

30000 

21 ■ 10 4 


Table 12. Mean Square Error (MSE) in the estimation of the target position, using O-MCMC-SMH (Th = 1) and SMC. The total number of 
evaluations of the target is Ej = 100(/V7V + 1) for O-MCMC-SMH, whereas Ej = 100N(Ty + 1) for adaptive SMC. 


9. Conclusions 

In this work, we have introduced a novel family of MCMC algorithms, named Orthogonal MCMC schemes, 
that incorporates “horizontal” MCMC transitions to share information among a cloud of parallel “vertical” MCMC 
chains. We have described different alternatives for exchanging information among independent parallel chains. 
Compared to the fully independent parallel chains approach, the novel interacting techniques show a more robust 
behavior with respect to the parameterization and better performance for different number of chains. One reason of 
this behavior is that the novel algorithms provide a good trade-off between the use of an independent and a random 
walk proposal density, i.e., between local an global exploration. We have considered two different approaches for 
the interaction among the chains: in the first one, an MCMC technique over the entire population is directly applied, 
whereas in the second one, the initial population P t is used for building a suitable mixture density i//(x) employed 
as proposal function in the horizontal transitions. This second approach can be interpreted as an adaptive MCMC 
scheme, where the location parameters of the N components of the mixture i//(x) are driven by N parallel MCMC 
chains. The outputs of these parallel chains are also employed in the approximation of the target. Furthermore, we 
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have designed different parallel Multiple Try Metropolis (P-MTM) schemes using an independent proposal pdf, where 
the generated candidates are recycled in order to reduce the overall computational cost. Finally, we have described 
two modified versions of O-MCMC for optimization and inference in big data problems. The ergodicity of all the 
proposed methodologies has been proved and several numerical simulations have been provided in order to show the 
advantages of the novel approach. 

In future works we plan to address the development of parallel and data-distributed implementations of O-MCMC 
algorithms, considering the use of strategies for monitoring the convergence of the vertical chains, and tempered 
versions for sampling from high-dimensional and multi-modal targets. 
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A. Stationary distribution of O-MCMC 

In this section, we prove the ergodicity of the proposed schemes. First of all, we study the mixture-based approach 
introduced in Section 5.2, and then the population-based approach described in Section 5.1, within O-MCMC. 


A.l. Analysis for the mixture-based approach 

Let us consider two MCMC kernels, ^f, ( , V) (yl x ) an d K ( n H> (z\y) with x, y, z e D e K rf ', corresponding to the n-th chain 
for the vertical and horizontal steps, respectively. We assume tt(-) is the invariant density of both chains. Namely, we 
consider MCMC techniques whose steps are summarized in the two conditional probabilities, K ( ,P(y\x) and K ( „ H) ( z|y), 
such that 


f ^, v, (ylx)7t(s)dx 
Jo 

f K ( n H> (z\y)n(y)dy 
Jo 


n(y\ 

7 t(z). 


For the sake of simplicity, we tackle a simpler case where Kj, v> , are used sequentially, once each (i.e., Ty = 1 
and T h = 1). Namely, we consider the sequential application of K ( n v> and K\ H) , i.e, first draw y' ~ (y|x) and then 

draw z' ~ K„ (z|y'). The transition probability from z to x is given by 

T(z|x) = f K ( n H fz\y)K ( n v) (y\x)dy. (18) 

Jo 

The target n is also invariant w.r.t. T(z|x) [38, Chapter 1], Indeed, we can write 


r 

Jo 


T (z\x)7r(x)dx = 


= f f Kl H \z\y)K ( P(y\x)dy 
Jd Jd 

- f K™(z |y) f K^(ylx)7r(x)dx 
Jd Jd 

= f K ( n H) (z\y)n(y)dy, 

Jo 


n(x)dx, 


dy. 


= tr(z). 


(19) 


which is precisely the definition of invariant pdf associated to T(z|x). Clearly, this argument is valid for each n — 
1... ,1V, and can be easily extended for the product of more than two kernels (i.e., for any Ty,Tn < °o). 


A.2. Analysis for the population-based approach 

Considering now an extended state space, R. dxXN , we can interpret that O-MCMC yields a unique chain in R. dxXN . 
Namely, one population of states at the f-th iteration represents one extended state of this unique chain. Here, we 
show that this chain, generated by O-MCMC, has the extended target density 

N 

7t g (x U ...,X N ) oc n n (x„), 

n= 1 

as invariant pdf. We can use similar arguments to those employed previously, considering now a population of current 
states, i.e., 

V t -1 = {x^.-.-.x^,). 
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We denote the vertical MCMC kernels as K^\x n! ,|x„ jf _i) with n as invariant pdf, whereas K iHt (P,\P, \) denotes the 
horizontal kernel 9 with invariant pdf the aforementioned extended target pdf, 

N 

TtgiP) oc 7T g (Xi, . . . , X N ) = Y\ n<X n)- 
n= 1 

Thus, in this case the complete kernel of the O-MCMC procedure formed by one vertical and one orthogonal step, is 


T(P,\P,-2) 


f 

Jd n 


K m (P,\P t . i) 


^ K^\x n ^ X |x„ jf _ 2 ) Y\ dx «, 

n= 1 n=l 


i,t -1 • 


In this case, we can write 


Jo" 


T(P t \P t _ 2 )n,AP t _ 2 )dP t _ 2 = 


- f K (H \p t \p,_ i) r n nfi n dx nt _ i, 

[do N \ n=l „ =l ) n=l „=i 

X N N 

K (H) {P t \P t .x) P| 7T(X„,,_ 1 ) P| dx nJ - X 

n= 1 n=l 

= f K (H \P t IP t -i)7tg(P,-i)dP,-i - ;r g (n). 


Namely, the kernel 'r(P,\P, . 2 ) has n H as invariant density. Once more, this result can be easily extended when T v 
vertical and Tu horizontal transitions are applied by using the same arguments. Note that the generated parallel chains 
preserve the pdf n as invariant pdf, as shown previously, but in general is not reversible [38, Section 1.12.7], 


B. Population Monte Carlo (PMC), Sequential Monte Carlo (SMC) and distribution after resampling 

Resampling procedures are employed in different Monte Carlo techniques such as Population Monte Carlo (PMC), 
Iterated Batch Importance Sampler (IBIS) and, more generally, in Sequential Monte Carlo (SMC) methods for a static 
scenario [13, 54, 15], 


B.l. Standard PMC 

For simplicity, let us consider here a standard PMC-type scheme. In PMC, N different proposal pdfs q\,,q\< 
are employed at each iteration. Starting from [xj o, ■ • ■, x,v,o), the basic PMC scheme consists of the following steps: 

1. For t - 1,.. ,,T: 

(a) For n = 1,.. ,,N: 

i. Propagation: Draw one sample x„ jf from q n , i.e., 


x n.t ~ dn ] ), 


ii. 

iii. 


Weighting: Assign the unnormalized weight w, ht — —^ and store the pair [x„,,, w, h ,}. 
Resampling: Draw N independent samples {z\,..., z A /[ such that each z„ e !x lf , ,,., x,v. ( ) for n = 


1,A, with probability 


Pn = 


W„,t 

2*= 1 w k,t 


y N rr(Xkj) 

4jA-= 1 ft(x t , r |x fc ,-i) 


( 20 ) 


iv. Set x„, f = z„. 

2. Use all the pairs {x n ,,w„ tt }^f =l in order to build a unique IS estimator (normalizing jointly the weights w nJ ). 


The step 2(b) corresponds to resample (with replacement) N times the population [x„ ,}^ =1 . Note that the weights in 
Eq. (20) are the same used in (10). 


'Tor the sake of simplicity, we abuse of the notation using here the set V, as a vector. Moreover, we assume 7V = 1 and / // I. 
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B.2. A Sequential Monte Carlo sampler 

A generic Sequential Monte Carlo method for a static inference scenario have been exhaustively described in [15]. 
For facilitating the comparison with O-MCMC, we describe a specific SMC scheme (also known as “resample-move” 
scheme [62]) without considering a sequence of tempered target pdfs, but always the true target, i.e, ft. Below, we 
describe a specific SMC technique which belongs to this wide class, due to its connection to the O-MCMC framework. 
Starting from the population {x 10 , • •., x ;V .o), this specific SMC scheme for a static scenario consists of the following 
steps: 

1. For t = 

(a) For n= 1,..., N: 

i. Propagation: draw one sample x„ f from q n , i.e., 


X«,r ~ r/,, (x|X w? £_1 ), 


ii. 

iii. 


Weighting: Assign the unnormalized weight w, ut — ^ and store the pair {x nt , w„ jf ). 

Resampling: Draw N independent samples {zi,...,zjv} such that each z„ e jx] ,,..., x,v./] for n — 


1,..., N, with probability 


Pn = 


W„, t 

yi N 

Tik= 1 W k,t 


y N MXkj) 
2->k=l q k (x tJ |x t ,,_i) 


(21) 


iv. Moving: Apply one step of A parallel MCMC chains (with invariant target n), starting from [zj,..., z ;V ) 
and obtaining the new population {xi,,..., x^,}. 


Above we have considered only 7V = 1 step for each “vertical” MCMC moves. However, it can be used T v different 
steps as well, as in the numerical example in Section 8.3. In this case, the total number of evaluations of the target is 
E T — TN{ 1 + TV). Finally, observe that the resampling plays a similar role to the orthogonal step in O-MCMC. 


B.3. Distribution after resampling 

For the sake of simplicity, since we consider a generic iteration t, let us simplify the notation, denoting x„ = 
x nt ~ q„(x |x„ jf _i) (1 < n < N, 1 < t < T), and q n (x) = q„(x |x„ r _i). Moreover, we consider the following simplified 
procedure: 

1. For n - 1,... ,1V, draw x n ~ q„(x) and compute the weights [3 n in Eq. (20). 

2. Resample one sample z e [xi,..., x ;V ) according to the probabilities fi„, n = 1,..., N. 

In this section, we write the density <p(z) where z is obtained by the procedure above. We define as 

m-.« = [xi,...,x n _ 1 ,x B+ i,...,x A r], 


the matrix containing all the samples except for the n-th. Let us also denote as z e {x | ..., x, v ), a generic sample after 
applying one multinomial resampling step. Hence, the distribution of z is given by 

<t>( z) = f ir w (z|xi,...,x;v) 

Jd n 


j J tjnixf) 


dx\ ... dxh 


( 22 ) 


^(zlxt,. ■ • ,Xat) = 'YjPA'z - x 7 ), (23) 

7=1 

and fij are given in Eq. (20). Note that, by using the notation 7r' Vl (z|X| ,..., x N ) we have emphasized the depen¬ 
dence on the generated samples x„’s in order to facilitate the understanding of Eq. (22). After some straightforward 
rearrangements, Eq. (22) can be rewritten as 


0(z) 


N 

I 

7=1 


f 

JOn-' 


7t(z) 


yN TT(X„) 

2-in=l q n (x n ) 
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N 

> 

j J 

dm^j 

n— 1 
- ntj 

> 


(24) 
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Finally, we can write 



(25) 


where Z = ^ J2!=i q^lrj * s t ^ le est i mate of the normalizing constant of the target obtained by using the importance 
sampling technique. The equation above represents the density of a resampled particle. Clearly, for a finite value of 
N , there exists a discrepancy between <p(z) and 7r(z). 


C. Calderhead’s MCMC technique based on multiple candidates 


A similar approach to ensemble algorithm proposed in Table 5 has been proposed in [12], which is suggested as 
a general construction in order to parallelize a Metropolis-Hastings algorithm. The algorithm can be employed as 
orthogonal technique in O-MCMC and is outlined below: 

1. Set t = 1, and choose an initial state Xi. 

2. For m — 1,..., 

(a) Draw L candidates zi,.. ,,z L from q( z l5 ... ,z L \x t ) and set z L+ \ = x r . 

(b) Denoting with 

Z-k = [zi, • • ■ ,Zk-i,Zk+i ,..., Z/J, 

the matrix containing as column all the z’s with the exception of z k , Compute L+ 1 normalized weights 

n(z k )q(Z- k \z k ) 




2r=i n (ze)q(Z-(\Z{) 


(26) 


(c) Resample L times within the set [zi,..., z i+ ]) of L+ 1 elements, obtaining L samples x /+ i,..., accord¬ 
ing to the probabilities [if in Eq. (26), £ - 1,... ,L + 1. 

(d) Set t — t + L — mL. 

3. Return {x,}^. 

In this case, unlike in Table 5, the proposal density q depends on the previous state of the chain. Below, we also 
discuss the ergodicity of this technique where, for simplicity, we consider of resampling only once in step 2(c) and 
assume 

L 

q( zi,... ,z L |x,) = Y\ 

i=i 

Denoting as K(zk\x,) the kernel of the method and noting that q(z\, ..., Z/Jx,) = gfZ^+qlx,), we can write 

n(x,)K(z k \x,) = Ln(x t ) q(z u 

Jd l -' 


n(z k )q(Z- k \z k ) 

,z L \x t )——— - ———d Z- k , 


-n{x,)n{z k ) 


f 

Jo’-' 


E^i 1 n{z ( )q(Z. ( \Z() 

qrZ-(L + \)\zL+i)q{Z- k \z k ) 


Since z L+ \ - x t , then 


7T(X,)K( Z k \x t ) - -7T(X,)JT(z k ) 


X 


Z£}*(z e )q( Z_ ( \Z() 
q(Z- {L +i)\x,)q(Z^ k \z k ) 


dZ- k = 


D L -' Z«, 1+1 n{zt)q{Z-e\Z() + n(z k )q(Z- k \z k ) + n(x,)q(Z HL+l) \x r ) 


-dZ 


- n(z k )K(x t \z k ), 


which is the detailed balance condition (we have used that we can exchange the position of z k and x, without varing 
the expression above). 


27 



/ Digital Signal Processing 00 (2016) 1-31 


28 


D. Ergodicity of SMH 

Let us recall that we denote as P t \ - {xi jf _i,..., x W( _i |, the population of the states at the (t - l)-th iteration. 
A sufficient condition for proving the ergodicity of the chain, generated by SMH, is given by the detailed balance 
condition with respect to the extended target n g (xi,... ,xy) = njli Ir,-(x,). For the case P, + P t -\ (the case P, = P t ~\ 
is straightforward), the kernel of SMH can be expressed as 

KCPAV t -0 = Ay(x 0 ,,-i) 0 

2<=i rttipo 

where we have considered that the j-th state has been selected as a candidate for replacement and a is given by Eq. 
(6). Since je{ 1,..., A), for the interchangeability we have A equal probabilities (this is the reason of the factor A). 
Replacing the expression of a in Eq. (6), we obtain 


K(P, \P,-y) 


y(Xj,,-i) 


N<p(x 0 ,t- 1) 


y N l(x,.,-i) 
^i=l 7r(*i,t- 1) 


yN ifc-j-l) 


yJV Ax,j-1) 
L /-() 7r(x, , i) 


min 

0 <i<N 


l(Xj.,-l) ’ 
7r(x;,,-i) 


A 




*( X P- 1) £ 


N ¥>(x,i-i) • 

- min ' 


i)' 


=0 ;r(x,o <j< N 


Now, we can also write 


n g (P t ^)K(P,\P t _ i) 


j~[7t(x/,f-i) 

i=i 


A 


tt(x P - i) 


yN 


y(XQ.f-l)</?(Xj.,_i) 

y(x ' J - l) min 

i=0 7r(x; f jr(Xjj- 1) 


and defining yOP,-i,x 0i r-i) = £ 


N l(Xij-l) 

1=0 7r(X;i) 


min £ fx ''' 'j , we have 

0<i<N ^x,-.,-!) 


7t,(n-i)ACP,l?Vi) 


A 

Z 


N 

n ^ x/ 

i=lA 


,r-l, 


^(X 0 .r-i)^(x 7 ; f _i) 

y(!P f _i,x 0 ,,-i) 


where Z = J n n(x)dx. This expression above is symmetric w.r.t. x (t , t and x / r |. Since P, \ and P, differ only in the 
elementsxo,r-i andxy f _i (P,-\ containsx^_i whereas P, containsxo,,_i), then7f s (!P,_i)A('F,|!P f _i) = n g (P,)K(P t -i\P t ), 
which is precisely the detailed balance condition. 


E. Ergodicity of the parallel schemes based on multiple candidates 

Similarly as in PMC, the parallel Ensemble MCMC (P-EnM) and Multiple Try Metropolis (P-MTM) schemes in 
Tables 5-6 are based on the particle approximations of the measure of the target. In both cases, L independent samples 
zi,..., z l drawn from i/r(x), i.e., 

z e ~ if/(x), (27) 

for {= 1,... ,L. Below, we show that P-EnM and P-MTM yield reversible chains with stationary density the general¬ 
ized pdf n g , proving the detailed balance condition is satisfied [4], 

E.l. Parellel Multiple Try Metropolis 

In P-MTM, we can define the particle approximation based on the set {zi,..., // ), i.e., 

L 

7T (L) (z) = 'YjP(8( Z-Zf), 

t=\ 

28 


( 28 ) 
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where the normalized weights /^’s are given in Eq (10). Note that, the expression above coincides with Eq. (23). Let 
us also denote as the matrix 

Z-i k — [ Z 1 ,. . . , Z*_ 1 , Z k +\.... 5 1l] . 

containing all the samples z;’s with the exception of z k . We denote as K n (x nt \x nJ -\) is the MTM kernel of «-th chain, 
namely, K„( z|x) is the probability of the n-th chain of jumping from the state x = x,_i to z = z* e {zi,..., z j} (for 
simplicity, we consider here only the case z 4- x). Note that all the z/s are both drawn and resampled independently 
(see steps 2(a) and 2(b) in Table 6). Thus, the conditional probability K„( z|x) can be expressed as 


K n ( z = Zi-|x) = ^ K n (z k \x,k = €), 


(=i 


= L 


r l 

L\ n 


Hie) 


dZ^. 


for z 4 x, where the function a„ is given in Eq. (11) and we have considered the case x and z (the case, z = x is 
straightforward). The factor L is due of the exchangeability among the L random candidates. Thus, we can also write 


7T(x)K n (z k \x) = 
Ln(x)i//(z k ) 


= -n{x)n{z k ) 


' _t =0 

c l 

.rJoLt 


Hie) 


Hie) 


1 


Y L 7T(Zf) 
2jf=l ^(Zf) 




where we have also used the equality n(x) = hn(x). Replacing 


a„(x, z k \Z^ k ) = min 


1, 


Y L rr( Zf) 
2^=1 i/r(z f ) 


yL 7T(Zf) _ 7T(Z t ) ^tx) 
2jf=l l^(z f ) l/,(z k ) "*■ 


in the expression (29) and with some simple rearrangements, we obtain 

L 


n(x)K n (z k \x) = -n{x)n(z k ) 


f fj 

Jn L ' [ e= \ -U 


min 


yL 7T(Zf) 7T(Z t ) ' yL MZf) irfx) 

^t*k i/r(Zf) ^ lfl(Z k ) 4-iee=k l/,(Zf) |/<(X) 


£/Z_ 


We can observe that, in equation above, we can exchange the position of the variables x so that z k and the expression 
does not change. So that we can write 

7t(x)K n (z k \x) = n(z k )K n (x\z k ), (29) 

for all n = 1,... ,1V. The expression above is the so-called detailed balance condition [4]: since it holds for all n, the 
complete horizontal MTM process has 7i g as invariant pdf. 


E.1.1. Important observations and Block Independent MTM 

First of all, note that with respect to a standard parallel multiple try approach, the novel P-MTM scheme generates 
only L candidates at each iteration, instead of NL samples. Indeed, P-MTM “recycles” the samples z.\,... pz k from 
the independent proposal pdf i//(x ), using them in all the N chains. Namely, in P-MTM, at one iteration, the different 
MTM chains share the same set of tries. However, looking a single chain, each time L new samples are drawn from 
<//(x) so that the chain is driven exactly from a standard (valid) MTM kernel. Figures 10(a) and (b) compare graphically 
the standard parallel MTM approach and the P-MTM scheme (with N — 2 chains and L = 3 tries). Observe that, in 
Figure 10(a), 12 new evaluations of the target are needed whereas only 6, in Figure 10(b). 

Using the same arguments, the method remains valid if only one resampling step is performed at each iteration, 
providing one z*: in this case the same z* is tested in the different acceptance tests of the N parallel MTM chains, at 
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(a) Standard parallel MTM approach. 
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(b) The P-MTM scheme in Table 6. 



Block 



(d) The BI-MTM scheme in Table 13. 


Figure 10. A graphical representation of the several parallel MTM schemes with N = 2 chains and L = 3 tries. The BI-MTM scheme in (d) requires 
only 6 evaluations of the target pdf and 2 multinomial sampling steps considering two iterations, t— 1 and t. 


the same iteration (exactly as in Table 3 and Fig. 4 for MH kernels). Figure 10(c) shows this case. In order to reduce 
the possible loss of the diversity, since several chains could jump at the same new state z*, an alternative strategy 
can be employed: the Block Independent MTM (BI-MTM) algorithm described in Table 13. Since the proposal ijj is 
independent and then fixed, before a block of N transitions, we can draw NL tries from t//(xj. Then, we can divide 
them in N sets Sj, with j = 1,... ,N and select one sample from each set, obtaining {z^,... ,Zk N } with z*. e S r Then, 
we use N different permutations of {z*,,..., z/ f v } for performing N iterations of the N parallel chains, providing a better 
mixing with respect to the case in Figure 10(c). This strategy, i.e., the BI-MTM scheme, is perfectly equivalent to 
the previous one, shown in Figure 10(c), from a theoretical and computational point of view. BI-MTM is represented 
graphically in Figure 10(d). 


E.2. Parallel Ensemble MCMC 

Let us consider now the method in Table 5. In this case, the particle approximation is 

L 

^ a{6(z - Z {) + a L+ \6(z - x„ : ,_!) 

t=\ 

L +1 

^ atS(z - ze), (30) 

1=1 

where z^+i = x nJ i. In this case, for a given n = the conditional probability K, ,(z = z*|x), where x = x„, t 

and z* e [z u ■ ■. ,z L ,z L+ i = x^}, is given by 


£' i+1) (z) = 


^«(z*|x) = 




ftf +1 \z k ) dZ _ 


(31) 
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Table 13. Block Independent Multiple Try Metropolis (BI-MTM) algorithm for N parallel chains. 


1. Let N be the total number of parallel MTM chains and T // he the total number of iterations of each chain, such that j eN. 
Choose a number of tries L. sSet to = mTy + (m — 1)7’// if BI-MTM is used within O-MCMC. Otherwise, set to = 0. 

2. For each block b = 1, _ B = yf do: 

(a) Draw NL i.i.d. candidates z®,..., z® ~ <^(x), for /; = 1 

(b) Draw one sample z kh from each set Si, = .... z^' 1 [ for h - I./V. with probability 


P 


(A) _ 
e 


7 Hzf > ) 

t«zf) 


yL «*?'> ' 


Thus, finally we have TV different samples, {z&j,..., z^}, such that Zk h e for h = 1,..., A. 

(c) Create the circular permutations \ n j e {z^,..., z^} as defined in Eq. (12). 

(d) For t = (b - l)N + l + to,..., bN + to (i.e., exactly N transitions): 

i. Set j = t — (b— l)N - to (so that j — 1,..., N, in one block). 

ii. For n = 1,..., N: 

A. Set = \ n ,j, with probability 


a n (Xn,t-uVnj) = min 1, 


y L 

L e=i 


y L _ rriVnj) zr(x„, f -i) 

Zjf=l f(zf) j) 


Otherwise, set 
iii- Set!P, = (xi, x N , t ). 


for z ^ x. After some simple rearrangements (similarly in P-MTM) and using the formula of the weights in Eq. (8), 
we obtain 


n{x)K„{z k \x) = Ln(x)il/(z k ) 


n ^ (zf) 

J D> [ {= \ U 


* M 

<p(1k) 


= -n(x)n(z k ) 


c l 

n ^ 

' Y(=u*k 


yL 7T(Zf) 7T(X) 

Z>(=1 i/,(z r ) ,/,(x) 

1 


dZ^ k , 


yL ?r(Zf) , Jr(zt-) , £(x) 

<p( z f ) ^ <p(z t ) ^ <AM 


dZ- 


Observing the last equation, we can clearly replace the variable x with z k and vice versa, without changing the 
expression. Hence, finally we obtain 

n(x)K n (z k \x) = n(z k )K n (x\z k ), 

for all ?! = 1,,1V, that is the detailed balance condition. For further considerations, see App. E.1.1 above. 
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